NOISE SUPPRESSION FOR A WIRELESS COMMUNICATION DEVICE

BACKGROUND 

[100] The present invention relates generally to communication apparatus. More
particularly, it relates to techniques for suppressing noise in a speech signal, which
may be used in a wireless or mobile communication device such as a cellular phone.

[101] In many applications, a speech signal is received in the presence of noise,
processed, and transmitted to a far-end party. One example of such a noisy environment
is a wireless application. In many conventional cellular phones, a microphone is placed
near the speaking user's mouth and used to pick up the speech signal. The microphone
typically also picks up background noise, which degrades the quality of the speech signal
transmitted to the far-end party.

[102] Newer-generation wireless communication devices are designed with
additional capabilities. Besides supporting voice communication, a device may allow the
user to view text or browse World Wide Web pages on its display. New videophone
services require the user to hold the phone away from the face, which calls for
"far-field" speech pick-up. Moreover, "hands-free" communication is safer and more
convenient, especially in an automobile. In any of these cases, the microphone in the
wireless device may be used in a "far-field" mode, whereby it is placed relatively far
away from the speaking user (instead of being pressed against the user's ear and mouth).
In far-field communication, the microphone receives less signal and more noise, and the
resulting lower signal-to-noise ratio (SNR) typically leads to poor signal quality.

[103] One common technique for suppressing noise is the spectral subtraction 
technique. In a typical implementation of this technique, speech plus noise is received via 
a single microphone and transformed into a number of frequency bins via a fast Fourier 
transform (FFT). Under the assumption that the background noise is long-time stationary 
(in comparison with the speech), a model of the background noise is estimated during 
time periods of non-speech activity whereby the measured spectral energy of the received 
signal is attributed to noise. The background noise estimate for each frequency bin is
used to estimate the SNR of the speech in that bin. Each frequency bin is then
attenuated according to its noise energy content with a respective gain factor computed
based on that bin's SNR.
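The conventional single-microphone procedure just described can be sketched in Python with NumPy. The sketch averages the bin magnitudes of frames flagged as non-speech into one fixed noise model, then attenuates every frame with a per-bin gain derived from that bin's SNR. The function name, the `is_speech` flags, the gain rule 1 - 1/SNR, and the floor `g_min` are illustrative assumptions, not details taken from this document.

```python
import numpy as np

def conventional_spectral_subtraction(frames, is_speech, g_min=0.1):
    """Single-microphone spectral subtraction with a static noise model.

    frames:    2-D array (num_frames, frame_len) of time-domain samples
    is_speech: 1-D boolean NumPy array, True where a frame contains speech
               (at least one frame must be False)
    g_min:     lower bound on the per-bin gain (hypothetical default)
    """
    spectra = np.fft.rfft(frames, axis=1)      # one row of bins per frame
    mags = np.abs(spectra)

    # Long-time stationary noise model: mean magnitude over non-speech frames.
    noise_mag = mags[~is_speech].mean(axis=0)

    # Per-bin SNR and gain; small epsilons avoid division by zero.
    snr = mags / np.maximum(noise_mag, 1e-12)
    gain = np.maximum(1.0 - 1.0 / np.maximum(snr, 1e-12), g_min)

    # Scale each bin by a real gain (phase unchanged) and return to time domain.
    return np.fft.irfft(gain * spectra, n=frames.shape[1], axis=1)
```

Because the noise model is computed once and never updated, this approach can only track noise that stays roughly stationary, which is the limitation discussed in the next paragraph.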

[104] The spectral subtraction technique is generally effective at suppressing
stationary noise components. However, because noisy environments (e.g., a street,
airport, or restaurant) are time-variant, the noise models estimated in the conventional
manner using a single microphone are likely to differ from the actual noise. This may
result in an output speech signal having low audible quality, insufficient noise
reduction, injected artifacts, or some combination of these.

[105] Another technique for suppressing noise uses a microphone array. For
this technique, multiple microphones are arranged, typically in a linear or some other type
of array. An adaptive or non-adaptive method is then used to process the signals received
from the microphones to suppress noise and improve speech SNR. However, microphone
arrays have not been applied to mobile communication devices, since they generally
require a size that cannot fit into the small form factor of current mobile devices.

[106] Conventional wireless communication devices such as cellular phones
typically use a single microphone to pick up the speech signal. The single-microphone
design limits the type of signal processing that may be performed on the received signal,
and may further limit the amount of improvement (i.e., the amount of noise suppression)
that is achievable. The single-microphone design is also ineffective at suppressing
noise in far-field applications, where the microphone is placed at a distance (e.g., a few
feet) from the speech source.

[107] As can be seen, techniques that can be used to suppress noise in a speech 
signal in a wireless environment are highly desirable. 

SUMMARY 

[108] The invention provides techniques to suppress noise in a signal
comprised of speech plus noise. In accordance with aspects of the invention, two or more
signal detectors (e.g., microphones) are used to detect respective signals. Each detected
signal comprises a desired speech component and an undesired noise component, with the
magnitude of each component being dependent on various factors such as the distance
between the speech source and the microphone, the directivity of the microphone, the
noise sources, and so on. Signal processing is then used to process the detected signals to
generate the desired output signal having predominantly speech, with a large portion of
the noise removed. The techniques described herein may be advantageously used for
both near-field and far-field applications, and may be implemented in various wireless
and mobile devices such as cellular phones.

[109] An embodiment of the invention provides a mobile communication device
that includes a number of signal detectors (e.g., two microphones), optional first and
second beam forming units, and a noise suppression unit. The beam forming units and
noise suppression unit may be implemented within a digital signal processor (DSP). Each
signal detector provides a respective detected signal having a desired component plus an
undesired component. The first beam forming unit receives and processes the detected
signals to provide a first signal s(t) having the desired component plus a portion of the
undesired component. The second beam forming unit receives and processes the detected
signals to provide a second signal x(t) having a large portion of the undesired component.
The noise suppression unit then receives and digitally processes the first and second
signals to provide an output signal y(t) having substantially the desired component and a
large portion of the undesired component removed. The noise suppression unit may be
designed to digitally process the first and second signals in the frequency domain,
although signal processing in the time domain is also possible. The noise suppression
unit may be designed to perform the noise cancellation using a spectrum modification
technique, which provides improved performance over other noise cancellation
techniques.

[110] In one specific design, the noise suppression unit includes a noise
spectrum estimator, a gain calculation unit, a speech or voice activity detector, and a
multiplier. The noise spectrum estimator derives an estimate of the spectrum of the noise
based on a transformed representation of the second signal. The gain calculation unit
provides a set of gain coefficients for the multiplier based on a transformed representation
of the first signal and the noise spectrum estimate. The multiplier receives and scales the
magnitude of the transformed first signal with the set of gain coefficients to provide a
scaled transformed signal, which is then inverse transformed to provide the output signal.
The activity detector provides a control signal indicative of active and non-active time
periods, with the active time periods indicating that the first signal includes
predominantly the desired component. The first beam forming unit may be allowed to
adapt during the active time periods, and the second beam forming unit may be allowed to
adapt during the non-active time periods.






Attorney Docket No.: 122-1.1 



[111] Another aspect of the invention provides a wireless communication device, 
e.g., a mobile phone, having at least two microphones and a signal processor. Each 
microphone detects and provides a respective detected signal comprised of a desired 
component and an undesired component. For each detected signal, the specific amount of 
each (desired and undesired) component included in the detected signal may be dependent 
on various factors, such as the distance to the speaking source and the directivity of the 
microphone. The signal processor receives and digitally processes the detected signals to 
provide an output signal having substantially the desired component and a large portion 
of the undesired component removed. The signal processing may be performed in a 
manner that is dependent in part on the characteristics of the detected signals. 

[112] Various other aspects, embodiments, and features of the invention are also 
provided, as described in further detail below. 

[113] The foregoing, together with other aspects of this invention, will become 
more apparent when referring to the following specification, claims, and accompanying 
drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[114] FIGS. 1A through 1C are diagrams of three wireless communication
devices capable of implementing various aspects of the invention;

[115] FIG. 2 is a block diagram of a speech processing system suitable for 
removing background noise from a speech plus noise signal, and may be used for both 
near-field and far-field applications; 

[116] FIGS. 3A and 3B are block diagrams of an embodiment of a main beam 
forming unit and a blocking beam forming unit, respectively; 

[117] FIGS. 4, 5, and 6 are block diagrams of three different embodiments of the
noise suppression unit; and

[118] FIGS. 7A and 7B are diagrams of another speech processing system
suitable for removing background noise from a speech plus noise signal.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 

[119] FIG. 1A is a diagram of an embodiment of a wireless communication
device 100a capable of implementing various aspects of the invention. In this
embodiment, device 100a is a cellular phone having a pair of microphones 110a and
110b. Microphone 110a is located in the lower left corner of the device, and microphone
110b is located in the lower right corner of the device. The microphones may also be
located in other parts of the device, and this is within the scope of the invention. The
placement of the microphones may be constrained by various factors such as the small
size of the cellular phone, manufacturability, and so on.
[120] FIG. 1B is a diagram of an embodiment of a wireless communication
device 100b having three microphones 110. In this embodiment, microphone 110a is
located in the lower center of the device near the speaking user's mouth and may be used to
pick up desired speech plus undesired background noise. Microphone 110b is located on
the middle left side of the device, and microphone 110c is located on the middle right side
of the device. Additional microphones may also be used, and the microphones may be
placed in other parts of the device; this is within the scope of the invention. The
microphones do not need to be placed in an array. For improved performance, the
microphones may be located as far away from each other as practically possible.

[121] FIG. 1C is a diagram of an embodiment of a wireless communication
device 100c having a number of microphones 110. In this embodiment, device 100c
includes a larger display, which may be used for displaying text, graphics, videos,
and so on. Device 100c may be a handset for the new third-generation (3GPP) wireless
communication systems under development and deployment. Device 100c may also be a
personal digital assistant (PDA) with voice recognition or phone functions. Device 100c
may also be a video phone with or without web-browser capability. In general, device
100c may be any device capable of supporting voice communication, possibly along with
other functions (e.g., text, video, and so on). In the specific embodiment shown in FIG.
1C, microphones 110a through 110d are located in a line above the display area. The
microphones may also be placed in other locations of the device.
[122] Each of devices 100a, 100b, and 100c advantageously employs two or more
microphones to allow the device to be used for both "near-field" and "far-field"
applications. For near-field applications, one microphone (e.g., microphone 110a in FIG.
1B) or multiple microphones (e.g., microphones 110a and 110b in FIG. 1A) may be used
to pick up the speech signal from a close-by source. For far-field applications, the
microphones are designed to pick up the speech signal from a source located further away.
Noise suppression is used to remove noise and improve signal quality.

[123] Devices 100a and 100b are similar to conventional cellular phones and
may be used with the devices placed close to the speaking user. With the noise
suppression techniques described herein, devices 100a and 100b may also be used in a
hands-free mode whereby they are located further away from the user. Device
100c is a handset that may be designed to be placed away from the user (e.g., one to two
feet away) during use, which allows the user to better view the display while talking.

[124] FIG. 2 is a block diagram of a speech processing system 200 capable of
removing background noise from a speech plus noise signal using a number of
signal detectors. In an embodiment, microphones are used as the signal detectors.
System 200 may be used for both near-field and far-field applications, and may be
implemented in each of devices 100a through 100c in FIGS. 1A through 1C, respectively.
[125] System 200 includes two or more microphones 210a through 210n, a beam
forming unit 212, and a noise suppression unit 230a. Beam forming unit 212 may be
optional for some devices (e.g., for devices that use directional microphones), as
described below. Beam forming unit 212 and noise suppression unit 230a may be
implemented within one or more digital signal processors (DSPs) or some other
integrated circuit.

[126] Each microphone provides a respective analog signal that is typically
conditioned (e.g., filtered and amplified) and then digitized before being subjected to
signal processing by beam forming unit 212 and noise suppression unit 230a. For
simplicity, this conditioning and digitization circuitry is not shown in FIG. 2.

[127] The microphones may be located either close to, or at a relatively far
distance from, the speaking user during use. Each microphone 210 detects a
respective signal having a speech component plus a noise component, with the magnitude
of the received components being dependent on various factors, such as (1) the distance
between the microphone and the speech source, (2) the directivity of the microphone
(e.g., whether the microphone is directional or omni-directional), and so on. The detected
signals from microphones 210a through 210n are provided to each of two beam forming
units 214a and 214b within unit 212.

[128] Main beam forming unit 214a, which is also referred to as the "main beam
former", processes the signals from microphones 210a through 210n to provide a signal
s(t) comprised of speech plus noise. Main beam forming unit 214a may further be able to
suppress a portion of the received noise component. Main beam forming unit 214a may
be designed to implement any type of beam former that attempts to reject as much
interference and noise as possible. A specific design for main beam forming unit 214a is
shown in FIG. 3A below. Main beam forming unit 214a may also be an optional unit that
may be omitted for some devices (e.g., if the signal s(t) can be obtained from one
microphone). Main beam forming unit 214a provides the signal s(t) to noise suppression
unit 230a.

[129] Blocking beam forming unit 214b, which is also referred to as the "blocking
beam former", processes the signals from microphones 210a through 210n to provide a
signal x(t) comprised mostly of the noise component. Blocking beam forming unit 214b
is used to provide an accurate estimate of the noise and to block as much of the desired
speech signal as possible. This in turn allows for effective cancellation of the noise in the
signal s(t). Blocking beam forming unit 214b may also be designed to implement any one
of a number of beam formers, one of which is shown in FIG. 3B below. Blocking beam
forming unit 214b provides the signal x(t) to noise suppression unit 230a. By employing
blocking beam forming unit 214b to generate the mostly noise signal x(t), system 200
may use various types of microphones (e.g., omni-directional microphones, dipole
microphones, and so on), which may pick up any combination of signal and noise.

[130] A beam forming controller 218 directs the operation of main and blocking
beam forming units 214a and 214b. Controller 218 typically receives a control signal
from a voice activity detector (VAD) 240. Voice activity detector 240 detects the
presence of speech at the microphones and provides the Act control signal indicating
periods of speech activity. The detection of speech activity can be performed in various
manners known in the art, one of which is described by D.K. Freeman et al. in a paper
entitled "The Voice Activity Detector for the Pan-European Digital Cellular Mobile
Telephone Service," 1989 IEEE International Conference on Acoustics, Speech and Signal
Processing, Glasgow, Scotland, March 23-26, 1989, pages 369-372, which is incorporated
herein by reference.

[131] Beam forming controller 218 provides the necessary controls that direct 
main and blocking beam forming units 214a and 214b to adapt at the appropriate times. 
In particular, controller 218 provides an Adapt_M control signal to main beam forming
unit 214a to enable it to adapt during periods of speech activity, and an Adapt_B control
signal to blocking beam forming unit 214b to enable it to adapt during periods of non- 
speech activity. In one simple implementation, the Adapt_B control signal is generated 
by inverting the Adapt_M control signal. 
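The control flow of paragraph [131] can be sketched as follows. Because the Freeman et al. detector is beyond the scope of a short example, a simple frame-energy threshold stands in for voice activity detector 240; that substitution, the threshold value, and the function names are hypothetical. Adapt_B is obtained by inverting Adapt_M, as in the simple implementation described above.

```python
import numpy as np

def energy_vad(frames, threshold):
    """Stand-in VAD: flag a frame as speech when its mean power exceeds threshold.

    This is NOT the Freeman et al. detector; it only illustrates the Act signal.
    """
    power = np.mean(np.asarray(frames, dtype=float) ** 2, axis=1)
    return power > threshold

def adaptation_controls(act):
    """Derive the Adapt_M and Adapt_B enables from the Act signal.

    Adapt_M enables the main beam former during speech activity;
    Adapt_B is its inverse and enables the blocking beam former otherwise.
    """
    adapt_m = np.asarray(act, dtype=bool)
    adapt_b = ~adapt_m
    return adapt_m, adapt_b
```

With this arrangement the two beam formers never adapt at the same time, so each one only sees the kind of interval (speech or non-speech) it is meant to learn from.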

[132] FIG. 3A is a block diagram of an embodiment of main beam forming unit
214a. The signal from microphone 210a is provided to a delay element 312, and the
signals from microphones 210b through 210n are respectively provided to adaptive filters
314b through 314n. Delay element 312 delays the signal from microphone 210a such
that the delayed signal is approximately time-aligned with the outputs from adaptive
filters 314b through 314n. The amount of delay provided by delay element 312 thus
depends on the design of adaptive filters 314. If a finite impulse response (FIR)
filter is used for each adaptive filter, one suitable delay is half the number of filter taps.

[133] Each adaptive filter 314 filters its received signal such that the error signal
e(t) used to update the adaptive filter is minimized during the adaptation period.
Adaptive filters 314 may be designed to implement any one of a number of adaptation
algorithms known in the art, such as the least mean square (LMS) algorithm, the
normalized least mean square (NLMS) algorithm, the recursive least square (RLS)
algorithm, and the direct matrix inversion (DMI) algorithm. Each of the LMS, NLMS, RLS,
and DMI algorithms (directly or indirectly) attempts to minimize the mean square error (MSE)
of the error signal e(t) used to update the adaptive filter. In an embodiment, adaptive
filters 314b through 314n implement the NLMS algorithm.

[134] The NLMS algorithm is described in detail by B. Widrow and S.D. Stearns
in a book entitled "Adaptive Signal Processing," Prentice-Hall Inc., Englewood Cliffs,
N.J., 1986. The LMS, NLMS, RLS, DMI, and other adaptation algorithms are also
described in detail by Simon Haykin in a book entitled "Adaptive Filter Theory," 3rd
edition, Prentice Hall, 1996. The pertinent sections of these books are incorporated
herein by reference.

[135] As shown in FIG. 3A, a respective summer 316 forms the difference between
the delayed signal from delay element 312 and the filtered signal from each adaptive
filter 314 to provide the error signal e(t) for that adaptive filter. This error signal is then
fed back to the adaptive filter and used to update its response. As also shown in
FIG. 3A, adaptive filters 314b through 314n are updated when the Adapt_M control
signal is enabled, and their responses are held fixed when the Adapt_M control signal
is disabled.

[136] To generate the signal s(t), a summer 318 receives and combines the
delayed signal from microphone 210a with the filtered signals from adaptive filters 314b
through 314n. The resultant output may further be divided by a factor of Nmic (where
Nmic denotes the number of microphones) to provide the signal s(t).
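A sample-by-sample sketch of the FIG. 3A structure, using the NLMS embodiment of paragraph [133], may help make the signal flow concrete. Microphone 0 plays the role of microphone 210a; the class name, step size `mu`, tap count, regularizer `eps`, and the sign convention e(t) = delayed - filtered are assumptions made for illustration.

```python
import numpy as np

class MainBeamFormer:
    """Sketch of FIG. 3A: delayed reference plus per-channel NLMS filters.

    Microphone 0 feeds the delay line (delay element 312); microphones
    1..N-1 feed FIR filters adapted by NLMS (filters 314b..314n).
    """

    def __init__(self, num_mics, num_taps=32, mu=0.1, eps=1e-8):
        self.num_mics = num_mics
        self.mu, self.eps = mu, eps
        self.delay = num_taps // 2                      # half the FIR tap count
        self.ref_line = np.zeros(self.delay + 1)        # delay line for mic 0
        self.w = np.zeros((num_mics - 1, num_taps))     # one FIR per other mic
        self.buf = np.zeros((num_mics - 1, num_taps))   # newest sample at index 0

    def step(self, x, adapt_m):
        """Process one multi-microphone sample vector x and return s(t)."""
        # Delay element 312: the last entry is mic 0 delayed by self.delay samples.
        self.ref_line = np.roll(self.ref_line, 1)
        self.ref_line[0] = x[0]
        d = self.ref_line[-1]

        # Adaptive filters 314: shift in the newest samples and filter.
        self.buf = np.roll(self.buf, 1, axis=1)
        self.buf[:, 0] = x[1:]
        y = np.sum(self.w * self.buf, axis=1)

        # Summers 316: one error signal per adaptive filter.
        e = d - y
        if adapt_m:  # NLMS update only while Adapt_M is enabled
            norm = self.eps + np.sum(self.buf ** 2, axis=1)
            self.w += self.mu * (e / norm)[:, None] * self.buf

        # Summer 318: combine, then divide by Nmic.
        return (d + np.sum(y)) / self.num_mics
```

For example, feeding the same white-noise sequence to two microphones with `adapt_m=True` drives the filter toward a pure delay of 16 taps, after which s(t) closely tracks the delayed reference.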

[137] FIG. 3A shows a specific design for main beam forming unit 214a. Other
designs may also be used and are within the scope of the invention. For example, main
beam forming unit 214a may be implemented with a "Griffiths-Jim" beam former that is
described by L.J. Griffiths and C.W. Jim in a paper entitled "An Alternative Approach to
Linearly Constrained Adaptive Beamforming," IEEE Trans. Antennas and Propagation,
January 1982, vol. AP-30, no. 1, pp. 27-34, which is incorporated herein by reference.

[138] FIG. 3B is a block diagram of an embodiment of blocking beam forming
unit 214b. The signal from microphone 210a is provided to a delay element 322, and the
signals from microphones 210b through 210n are respectively provided to adaptive filters
324b through 324n. Delay element 322 provides an amount of delay approximately
matching the delay of adaptive filters 324. If an FIR filter is used for each adaptive
filter, one suitable delay is half the number of filter taps.

[139] Each adaptive filter 324 filters its received signal such that an error signal
e(t) is minimized during the adaptation period. Adaptive filters 324 may also be
implemented using various designs, such as NLMS adaptive filters. To generate the
signal x(t), a summer 328 subtracts the filtered signals from adaptive filters
324b through 324n from the delayed signal from delay element 322. The signal x(t)
is the common error signal for all adaptive filters 324b through 324n within the
blocking beam former, and is used to adjust the responses of these adaptive filters.
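The FIG. 3B structure differs from FIG. 3A mainly in that one common error, x(t) itself, drives every filter. The following sketch assumes NLMS filters as suggested above; normalizing by the combined input energy of all filters, the class name, and the parameter values are illustrative assumptions.

```python
import numpy as np

class BlockingBeamFormer:
    """Sketch of FIG. 3B: x(t) = delayed mic 0 minus the sum of all filter outputs.

    x(t) is the common error that updates every NLMS filter (324b..324n).
    """

    def __init__(self, num_mics, num_taps=32, mu=0.1, eps=1e-8):
        self.mu, self.eps = mu, eps
        self.delay = num_taps // 2
        self.ref_line = np.zeros(self.delay + 1)        # delay element 322
        self.w = np.zeros((num_mics - 1, num_taps))
        self.buf = np.zeros((num_mics - 1, num_taps))   # newest sample at index 0

    def step(self, x, adapt_b):
        """Process one sample vector x (length num_mics) and return x(t)."""
        self.ref_line = np.roll(self.ref_line, 1)
        self.ref_line[0] = x[0]
        d = self.ref_line[-1]

        self.buf = np.roll(self.buf, 1, axis=1)
        self.buf[:, 0] = x[1:]

        # Summer 328: subtract all filtered signals from the delayed signal.
        err = d - np.sum(self.w * self.buf)
        if adapt_b:  # NLMS update only while Adapt_B is enabled
            norm = self.eps + np.sum(self.buf ** 2)
            self.w += self.mu * err / norm * self.buf
        return err
```

As a usage check, a component that is identical on both microphones while adaptation is enabled is driven out of x(t), which is the "blocking" behavior the unit is named for.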

[140] Referring back to FIG. 2, noise suppression unit 230a performs noise
suppression in the frequency domain. Frequency domain processing may be preferred
over time domain processing because it can provide superior noise suppression: the
mostly noise signal x(t) does not need to be highly correlated with the noise component
in the speech plus noise signal s(t), and need only be correlated with it in the power
spectrum, which is a much more relaxed criterion.

[141] Within noise suppression unit 230a, the speech plus noise signal s(t) from main
beam forming unit 214a is transformed by a transformer 232a to provide a transformed
speech plus noise signal S(ω). In an embodiment, the signal s(t) is transformed one block
at a time, with each block including L data samples of the signal s(t), to provide a
corresponding transformed block. Each transformed block of the signal S(ω) includes L
elements, S_n(ω_0) through S_n(ω_{L-1}), corresponding to L frequency bins, where n
denotes the time instant associated with the transformed block. Similarly, the mostly
noise signal x(t) from blocking beam forming unit 214b is transformed by a transformer
232b to provide a transformed mostly noise signal X(ω). Each transformed block of the
signal X(ω) also includes L elements, X_n(ω_0) through X_n(ω_{L-1}). In the specific
embodiment shown in FIG. 2, transformers 232a and 232b are each implemented as a fast
Fourier transform (FFT) that transforms a time-domain representation into a
frequency-domain representation. Other types of transforms may also be used, and this
is within the scope of the invention. The size of the digitized data block for the signals
s(t) and x(t) to be transformed can be selected based on a number of considerations
(e.g., computational complexity). In an embodiment, blocks of 128 samples at the
typical audio sampling rate are transformed, although other block sizes may also be
used. In an embodiment, the samples in each block are multiplied by a Hamming window
function, and there is a 64-sample overlap between each pair of consecutive blocks.
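The block transform of paragraph [141] (128-sample blocks, Hamming window, 64-sample overlap) can be sketched with NumPy as below. Two choices here are assumptions rather than details from the text: the periodic form of the Hamming window, selected because copies overlapped by half a block sum to the constant 1.08, and `np.fft.rfft`, which keeps only the L/2 + 1 non-redundant bins of a real block (the remaining bins of the L-point FFT are their complex conjugates).

```python
import numpy as np

BLOCK = 128   # samples per block, as in the embodiment
HOP = 64      # 64-sample overlap between consecutive blocks

def periodic_hamming(n):
    """Periodic Hamming window; copies overlapped by n/2 sum to 1.08."""
    k = np.arange(n)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * k / n)

def stft_blocks(signal, block=BLOCK, hop=HOP):
    """Split a signal into overlapping windowed blocks and FFT each one.

    Returns an array of shape (num_blocks, block // 2 + 1): one row per
    time instant n, one column per frequency bin. Requires len(signal) >= block.
    """
    signal = np.asarray(signal, dtype=float)
    win = periodic_hamming(block)
    num_blocks = 1 + (len(signal) - block) // hop
    frames = np.stack([signal[i * hop:i * hop + block] * win
                       for i in range(num_blocks)])
    return np.fft.rfft(frames, axis=1)
```

The same function would be applied to both s(t) and x(t), so that row n of each result corresponds to the same time instant.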

[142] The magnitude component of the transformed signal S(ω) is provided to a
multiplier 236 and a noise spectrum estimator 242. Multiplier 236 scales the magnitude
component of S(ω) with a set of gain coefficients G(ω) provided by a gain calculation
unit 244. The scaled magnitude component is then recombined with the phase component
of S(ω) and provided to an inverse FFT (IFFT) 238, which transforms the recombined
signal back to the time domain. The resultant output signal y(t) includes predominantly
speech and has a large portion of the background noise removed.
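The multiplier, phase recombination, and IFFT steps of paragraph [142] can be sketched as follows. The overlap-add reassembly and the division by 1.08 are assumptions: the text does not say how consecutive IFFT blocks are joined, and 1.08 is correct only if the analysis window was a periodic Hamming window with a hop of half a block.

```python
import numpy as np

def apply_gains_and_resynthesize(spectra, gains, block=128, hop=64):
    """Multiplier 236 and IFFT 238, followed by overlap-add of the blocks.

    spectra: complex array (num_blocks, block // 2 + 1) of windowed-block FFTs
    gains:   real array of the same shape, one G(w) per bin and block
    """
    mag = np.abs(spectra) * gains          # scale the magnitude component
    phase = np.angle(spectra)              # reuse the unmodified phase of S(w)
    blocks = np.fft.irfft(mag * np.exp(1j * phase), n=block, axis=1)

    out = np.zeros(hop * (len(blocks) - 1) + block)
    for i, b in enumerate(blocks):         # overlap-add consecutive blocks
        out[i * hop:i * hop + block] += b
    return out / 1.08                      # undo the summed window gain
```

With all gains equal to one, the output reproduces the input everywhere except the first and last half-block, which are covered by only one window.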
[143] It is sometimes advantageous, though not necessary, to filter the
magnitude components of S(ω) and X(ω) so that a better estimate of the short-term
spectrum magnitude of each signal can be obtained. One particular filter
implementation is a first-order infinite impulse response (IIR) low-pass filter with
different attack and release times.
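A first-order IIR smoother with separate attack and release behavior, as mentioned in paragraph [143], can be sketched per frequency bin as below. The coefficient values and the rule "use the attack coefficient when the magnitude rises" are illustrative assumptions.

```python
import numpy as np

def smooth_magnitude(mags, attack=0.3, release=0.9):
    """First-order IIR low-pass of |S_n(w)| (or |X_n(w)|) across blocks n.

    mags: array (num_blocks, num_bins). A smaller coefficient is used when
    the magnitude rises (fast attack) and a larger one when it falls
    (slow release).
    """
    mags = np.asarray(mags, dtype=float)
    out = np.empty_like(mags)
    state = mags[0].copy()
    for n, m in enumerate(mags):
        coef = np.where(m > state, attack, release)   # per-bin choice
        state = coef * state + (1.0 - coef) * m
        out[n] = state
    return out
```

For a single bin with attack 0.5 and release 0.9, the input sequence 0, 1, 1, 0 yields 0, 0.5, 0.75, 0.675: the estimate rises quickly and decays slowly.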

[144] Noise spectrum estimator 242 receives the magnitude of the transformed
signal S(ω), the magnitude of the transformed signal X(ω), and the Act control signal
from voice activity detector 240 indicative of periods of non-speech activity. Noise
spectrum estimator 242 then derives the magnitude spectrum estimates for the noise N(ω)
as follows:

$$|N(\omega)| = W(\omega)\,|X(\omega)| \qquad \text{Eq (1)}$$

where W(ω) is referred to as the channel equalization coefficient. In an embodiment,
this coefficient may be derived based on an exponential average of the ratio of the
magnitude of S(ω) to the magnitude of X(ω), as follows:

$$W_{n+1}(\omega) = \alpha\, W_n(\omega) + (1-\alpha)\,\frac{|S_n(\omega)|}{|X_n(\omega)|} \qquad \text{Eq (2)}$$

where α is the time constant for the exponential averaging and 0 < α ≤ 1. In a specific
implementation, α = 1 when voice activity detector 240 indicates a speech activity
period, and α = 0.98 when voice activity detector 240 indicates a non-speech activity
period.
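Eq (1) and Eq (2) can be sketched as a small stateful estimator that is called once per transformed block. The initial value W = 1, the epsilon guard against empty bins in |X(ω)|, and the class and method names are assumptions; α follows the specific implementation above (1 during speech, 0.98 otherwise).

```python
import numpy as np

class NoiseSpectrumEstimator:
    """Sketch of estimator 242: Eq (2) updates W(w), Eq (1) scales |X(w)|."""

    def __init__(self, num_bins, alpha_noise=0.98, eps=1e-12):
        self.w = np.ones(num_bins)        # channel equalization coefficients
        self.alpha_noise = alpha_noise
        self.eps = eps

    def update(self, s_mag, x_mag, act):
        """Return |N(w)| for one block given |S(w)|, |X(w)| and the Act flag."""
        s_mag = np.asarray(s_mag, dtype=float)
        x_mag = np.asarray(x_mag, dtype=float)
        alpha = 1.0 if act else self.alpha_noise   # freeze W during speech
        ratio = s_mag / np.maximum(x_mag, self.eps)
        self.w = alpha * self.w + (1.0 - alpha) * ratio   # Eq (2)
        return self.w * x_mag                              # Eq (1)
```

Freezing W during speech keeps speech energy in |S(ω)| from inflating the equalization ratio, while |N(ω)| still follows the current block of |X(ω)| and can therefore track non-stationary noise.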

[145] Noise spectrum estimator 242 provides the magnitude spectrum estimates
for the noise N(ω) to gain calculation unit 244, which then uses these estimates to generate
the gain coefficients G(ω) for multiplier 236.

[146] With the magnitude spectrum of the noise |N(ω)| and the magnitude
spectrum of the signal |S(ω)| available, a number of spectrum modification techniques
may be used to determine the gain coefficients G(ω). Such spectrum modification
techniques include spectrum subtraction, Wiener filtering, and so on.

[147] In an embodiment, the spectrum subtraction technique is used for noise
suppression, and the gain coefficients G(ω) may be determined by first computing the
SNR of the speech plus noise signal S(ω) relative to the noise estimate N(ω), as follows:

$$\mathrm{SNR}(\omega) = \frac{|S(\omega)|}{|N(\omega)|} \qquad \text{Eq (3)}$$

The gain coefficient G(ω) for each frequency bin ω may then be expressed as:

$$G(\omega) = \max\!\left(\frac{\mathrm{SNR}(\omega) - 1}{\mathrm{SNR}(\omega)},\; G_{\min}\right) \qquad \text{Eq (4)}$$

where G_min is a lower bound on G(ω).

[148] Gain calculator 244 thus generates a gain coefficient G(ω_j) for each
frequency bin j of the transformed signal S(ω). The gain coefficients for all frequency
bins are provided to multiplier 236 and used to scale the magnitude of the signal S(ω).

[149] In an aspect, the spectrum subtraction is performed based on a noise spectrum
N(ω) that is time-varying and derived from the mostly noise signal x(t), which
may be provided by the blocking beam former. This differs from the spectrum
subtraction used in conventional single-microphone designs, whereby N(ω) typically
comprises mostly stationary or constant values. This type of noise suppression is also
described in U.S. Patent No. 5,943,429, entitled "Spectral Subtraction Noise Suppression
Method," issued August 24, 1999, which is incorporated herein by reference. The use of
a time-varying noise spectrum (which more accurately reflects the real noise in the
environment) allows the inventive noise suppression techniques to cancel non-stationary
noise as well as stationary noise (non-stationary noise cancellation typically cannot be
achieved by conventional noise suppression techniques that use a static noise spectrum).
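The per-bin gain of gain calculation unit 244 can be sketched as below. The sketch assumes the standard floored magnitude spectral-subtraction rule G(ω) = max(1 - 1/SNR(ω), G_min), with SNR(ω) = |S(ω)|/|N(ω)| as in Eq (3); the value of `g_min` and the epsilon guards are illustrative.

```python
import numpy as np

def spectral_subtraction_gain(s_mag, n_mag, g_min=0.1, eps=1e-12):
    """Per-bin gain from SNR(w) = |S(w)| / |N(w)|, floored at G_min.

    G(w) = max(1 - 1/SNR(w), G_min), so that G(w)|S(w)| = |S(w)| - |N(w)|
    whenever the floor is not active.
    """
    s_mag = np.asarray(s_mag, dtype=float)
    n_mag = np.asarray(n_mag, dtype=float)
    snr = s_mag / np.maximum(n_mag, eps)                     # Eq (3)
    return np.maximum(1.0 - 1.0 / np.maximum(snr, eps), g_min)
```

For example, bins with |S| = 4, 1, 0.5 against a noise level of 1 receive gains 0.75, 0.1, and 0.1: the first bin keeps |S| - |N| = 3, and the other two are clamped at the floor instead of going to zero or negative.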

[150] The spectrum subtraction technique for a single microphone is also
described by S.F. Boll in a paper entitled "Suppression of Acoustic Noise in Speech
Using Spectral Subtraction," IEEE Trans. Acoustics, Speech, and Signal Processing,
April 1979, vol. ASSP-27, pp. 113-121, which is incorporated herein by reference.

[151] The spectrum modification technique is one technique for removing noise
from the speech plus noise signal s(t). The spectrum modification technique provides
good performance and can remove both stationary and non-stationary noise (using the
time-varying noise spectrum estimate described above). However, other noise
suppression techniques may also be used to remove noise, some of which are described
below, and this is within the scope of the invention.

[152] The noise suppression technique shown in FIGS. 2, 3A, and 3B provides
good results even for wireless devices having a small form factor. In general, it is
desirable to keep wireless devices as small as possible because of their portable nature.
However, a small form factor also results in the microphones being located relatively
close to each other (i.e., a small array). Conventional beam forming and noise
suppression techniques generally cannot achieve good results for a diffuse noise source
(i.e., not a direct noise source) with a small array. In contrast, the noise suppression
technique described herein can achieve good results even with a small array by
employing the blocking beam former to derive the mostly noise signal x(t) on a second
channel, and by further using spectrum modification to cancel stationary and
non-stationary noise.

[153] FIG. 4 is a block diagram of a noise suppression unit 230b capable of 
removing background noise from a speech plus noise signal. Noise suppression unit 230b 
achieves the noise reduction/suppression in the time-domain. 

[154] Within noise suppression unit 230b, the speech plus noise signal s(t) is
filtered by a pre-filter 432 to remove high frequency components, and the filtered speech
plus noise signal is provided to a voice activity detector 440 and a summer 434. The
mostly noise signal x(t) is provided to an adaptive filter 450, which filters the noise with a
particular transfer function h(t). The filtered noise p(t) is then provided to summer 434
and subtracted from the filtered speech plus noise signal to provide an intermediate signal
d(t) having predominantly speech and some amount of noise.

[155] Adaptive filter 450 may be implemented with a "base" filter operating in
conjunction with an adaptation algorithm (not shown in FIG. 4 for simplicity). The base
filter may be implemented as a finite impulse response (FIR) filter, an infinite impulse
response (IIR) filter, or some other filter type. The characteristics (i.e., the transfer
function) of the base filter are determined by, and may be adjusted by manipulating, the
coefficients of the filter. In an embodiment, the base filter is a linear filter, and the
filtered noise p(t) is a linear function of the received noise x(t). In other embodiments,
the base filter may implement a non-linear transfer function, and this is within the scope
of the invention.

[156] In an embodiment, the base filter is adapted during periods of non-speech
activity. Voice activity detector 440 detects the presence of speech activity on the speech
plus noise signal s(t) and provides a control signal that enables the adaptation of the
coefficients of the base filter when no speech activity is detected. The adaptation
algorithm can be implemented with any one of a number of algorithms, such as the least
mean square (LMS), normalized LMS (NLMS), recursive least squares (RLS), or direct
matrix inversion (DMI) algorithm, or some other algorithm.
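The source does not specify how voice activity detector 440 makes its decision. As a minimal sketch, assuming a simple frame-energy detector (a hypothetical stand-in, not the detector of this disclosure), the control signal that gates adaptation could be produced as follows. The function name, threshold ratio, and smoothing factor are all illustrative assumptions.

```python
def energy_vad(frames, threshold_ratio=4.0, alpha=0.9):
    """Return one flag per frame: True if speech is judged present.

    Hypothetical stand-in for voice activity detector 440. A frame is
    flagged as speech when its mean energy exceeds threshold_ratio times a
    running noise-floor estimate. The floor is updated only on frames judged
    to be noise, and the first frame is assumed to contain noise only.
    """
    noise_energy = None
    flags = []
    for frame in frames:
        energy = sum(v * v for v in frame) / len(frame)
        if noise_energy is None:
            noise_energy = energy
        active = energy > threshold_ratio * noise_energy
        if not active:
            # Exponential smoothing of the noise floor during non-speech.
            noise_energy = alpha * noise_energy + (1.0 - alpha) * energy
        flags.append(active)
    return flags


quiet = [0.1, -0.1] * 4
loud = [1.0, -1.0] * 4
flags = energy_vad([quiet] * 5 + [loud] * 3 + [quiet] * 2)
# flags: five False, three True, two False
```

The resulting flags play the role of the control signal: adaptation of the base filter would be enabled only on frames whose flag is False.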

[157] The base filter within adaptive filter 450 is adapted to implement (or
approximate) the transfer function h(t), which describes the correlation between the noise
components received on the signals s(t) and x(t). The base filter then filters the mostly
noise signal x(t) with the transfer function h(t) to provide the filtered noise p(t), which is
an estimate of the noise in the signal s(t). The estimated noise p(t) is then subtracted from
the speech plus noise signal s(t) by summer 434 to generate the intermediate signal d(t).
During periods of non-speech activity, the signal s(t) includes predominantly noise, and
the intermediate signal d(t) represents the error between the noise received on the signal
s(t) and the estimated noise p(t). The error signal d(t) is then provided to the adaptation
algorithm within adaptive filter 450, which then adjusts the transfer function h(t) of the
base filter to minimize the error.
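The adapt-only-during-silence loop of FIG. 4 can be sketched with an FIR base filter and the NLMS algorithm, one of the adaptation algorithms listed above. This is a minimal illustration under stated assumptions: pre-filter 432 is omitted, the tap count and step size are arbitrary, and the function name is hypothetical.

```python
import random


def nlms_noise_canceller(s, x, speech_active, num_taps=8, mu=0.5, eps=1e-8):
    """Time-domain adaptive noise canceller in the style of FIG. 4 (sketch).

    s: speech plus noise samples s(t) (pre-filter 432 omitted)
    x: mostly noise samples x(t)
    speech_active: one VAD flag per sample; adaptation is frozen when True
    Returns the intermediate signal d(t) and the final FIR coefficients h.
    """
    h = [0.0] * num_taps          # FIR base-filter coefficients
    buf = [0.0] * num_taps        # most recent x samples, newest first
    d = []
    for s_n, x_n, active in zip(s, x, speech_active):
        buf = [x_n] + buf[:-1]
        p_n = sum(hk * xk for hk, xk in zip(h, buf))  # estimated noise p(t)
        e_n = s_n - p_n                                # d(t) = s(t) - p(t)
        d.append(e_n)
        if not active:
            # NLMS: step size normalized by the input energy in the buffer.
            norm = eps + sum(xk * xk for xk in buf)
            h = [hk + mu * e_n * xk / norm for hk, xk in zip(h, buf)]
    return d, h


rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical noise path from x(t) to s(t): 0.8 x[n] + 0.3 x[n-1], no speech.
s = [0.8 * x[n] + 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
d, h = nlms_noise_canceller(s, x, [False] * len(x))
```

In the usage lines the noise path is a made-up two-tap filter and no speech is present, so d(t) shrinks toward zero as h converges to that path. Passing all-True flags freezes h at zero, so d(t) simply equals s(t).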

[158] In an embodiment, a spectrum subtraction unit 460 is used to further
suppress noise components in the intermediate signal d(t) to provide the output signal y(t)
having predominantly speech and a larger portion (or most) of the noise removed.
Spectrum subtraction unit 460 can be implemented as described above for noise
suppression unit 230a.
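Unit 230a's exact gain rule is described earlier in the document and is not reproduced here. For orientation only, a generic magnitude spectral subtraction step, which subtracts an estimated noise magnitude from each frequency bin, keeps the noisy phase, and applies a spectral floor, might look like the sketch below. The floor value, the noise estimate passed in, and the function name are assumptions.

```python
import cmath


def spectral_subtract(bins, noise_mag, floor=0.1):
    """Generic magnitude spectral subtraction on one frame (sketch).

    bins: complex spectrum of one frame of the intermediate signal
    noise_mag: estimated noise magnitude per bin (hypothetical input)
    floor: spectral floor, as a fraction of the original magnitude
    """
    out = []
    for b, n in zip(bins, noise_mag):
        mag = abs(b)
        new_mag = max(mag - n, floor * mag)   # subtract, never below the floor
        out.append(cmath.rect(new_mag, cmath.phase(b)))  # keep noisy phase
    return out


cleaned = spectral_subtract([3 + 4j, 1 + 0j], [2.0, 2.0])
# cleaned[0] is about (1.8+2.4j): magnitude 5 reduced to 3, phase unchanged
# cleaned[1] is about (0.1+0j): magnitude clamped to the floor
```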
[159] FIG. 5 is a block diagram of a noise suppression unit 230c, which is also
capable of removing background noise from a speech plus noise signal. Noise
suppression unit 230c achieves the noise reduction in the frequency domain.

[160] Within noise suppression unit 230c, the speech plus noise signal s(t) is
transformed by a fast Fourier transformer (FFT) 532a, and the mostly noise signal x(t) is
similarly transformed by an FFT 532b. Various other types of signal transforms may also
be used, and this is within the scope of the invention.

[161] The transformed speech plus noise signal S(ω) is provided to a voice
activity detector 540 and a summer 534. The transformed noise signal X(ω) is provided
to an adaptive filter 550, which filters the noise with a particular transfer function H(ω).
The filtered noise P(ω) is then provided to summer 534 and subtracted from the
transformed speech plus noise S(ω) to provide an intermediate signal D(ω) that includes
the speech component and has much of the low frequency noise component removed.

[162] Adaptive filter 550 includes a base filter operating in conjunction with an
adaptation algorithm. The base filter is adapted during periods of non-speech activity, as
indicated by a control signal from voice activity detector 540. The adaptation may be
achieved, for example, via an LMS algorithm. The base filter then filters the transformed
noise X(ω) with the transfer function H(ω) to provide an estimate of the noise on the
signal S(ω).
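Because each frequency bin has its own complex weight H(ω), the FIG. 5 canceller can be sketched as an independent one-tap filter per bin, adapted only on non-speech frames. This is an illustrative sketch that assumes complex FFT bins as input and uses a normalized LMS update; the function name, step size, and example noise path are hypothetical.

```python
import random


def freq_domain_canceller(S_frames, X_frames, speech_active, mu=0.5, eps=1e-12):
    """Per-bin complex NLMS canceller in the style of FIG. 5 (sketch).

    S_frames, X_frames: lists of frames; each frame is a list of complex bins
    speech_active: one VAD flag per frame; H is frozen when True
    Returns the intermediate frames D(w) and the final per-bin weights H(w).
    """
    H = [0j] * len(S_frames[0])
    D_frames = []
    for S, X, active in zip(S_frames, X_frames, speech_active):
        P = [h * x for h, x in zip(H, X)]      # P(w) = H(w) X(w)
        D = [s - p for s, p in zip(S, P)]      # D(w) = S(w) - P(w)
        D_frames.append(D)
        if not active:
            H = [h + mu * d * x.conjugate() / (abs(x) ** 2 + eps)
                 for h, d, x in zip(H, D, X)]
    return D_frames, H


rng = random.Random(1)
H_true = [0.5 + 0.5j, -0.2j, 1.0]   # hypothetical per-bin noise path
X_frames = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in H_true]
            for _ in range(200)]
S_frames = [[h * x for h, x in zip(H_true, X)] for X in X_frames]
D_frames, H = freq_domain_canceller(S_frames, X_frames, [False] * 200)
```

Normalizing by |X(ω)|² makes the step size independent of the noise level in each bin. With mu = 0.5 and a noise-free path, the weight error in each bin halves on every non-speech frame.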

[163] The noise components received on the signals S(ω) and X(ω) may be
correlated. The degree of correlation determines the theoretical upper bound on how
much noise can be cancelled using a linear adaptive filter such as the ones in blocks 450
and 550. A coherence function C(ω), which is indicative of the amount of statistical
correlation between the two noise components, may be expressed as:

$$C(\omega) = \frac{\left| E\{X(\omega)\, S^{*}(\omega)\} \right|^{2}}{E\{|X(\omega)|^{2}\} \cdot E\{|S(\omega)|^{2}\}}$$

where X(ω) is the noise received on the signal x(t), S(ω) is representative of the noise
received on the signal s(t), S*(ω) is the complex conjugate of S(ω), and E is the
expectation operation. C(ω) is equal to zero (0.0) if X(ω) and S(ω) are totally
uncorrelated, and is equal to one (1.0) if X(ω) and S(ω) are totally correlated. In the
designs described above, the linear adaptive filter (such as the ones in blocks 450 and 550)
can cancel the correlated noise components, while the spectrum modification technique
further suppresses the uncorrelated portion of the noise.
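In practice the expectations in the coherence expression can be approximated by averages over frames. The sketch below assumes the magnitude-squared form of the coherence function given above and uses hypothetical names; it estimates C(ω) for a single bin and shows the two limiting cases.

```python
import random


def coherence(X_samples, S_samples):
    """Magnitude-squared coherence for one frequency bin (sketch).

    X_samples, S_samples: complex values of X(w) and S(w) for the same bin
    over many frames; each expectation E{.} is replaced by a frame average.
    """
    n = len(X_samples)
    cross = sum(x * s.conjugate() for x, s in zip(X_samples, S_samples)) / n
    power_x = sum(abs(x) ** 2 for x in X_samples) / n
    power_s = sum(abs(s) ** 2 for s in S_samples) / n
    return abs(cross) ** 2 / (power_x * power_s)


rng = random.Random(2)


def noise(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


X = noise(5000)
c_correlated = coherence(X, [2j * x for x in X])  # fully correlated: about 1.0
c_independent = coherence(X, noise(5000))         # independent: close to 0.0
```

By the Cauchy-Schwarz inequality the estimate always lies between 0 and 1, matching the range stated above.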

[164] The magnitude component of the intermediate signal D(ω) is then
provided to a noise spectrum estimator 542 and a multiplier 536. The operation of blocks
542 and 544 is similar to that of blocks 242 and 244, respectively, which have been
described above.

[165] FIG. 6 is a block diagram of a noise suppression unit 230d that is also
capable of removing background noise from a speech plus noise signal. Noise
suppression unit 230d also achieves the noise reduction in the frequency domain, and may
be used even if the noise components received on the two signals s(t) and x(t) are related
by a non-linear function. In particular, noise suppression unit 230d is capable of
removing deterministic noise components from the speech plus noise signal s(t).

[166] Within noise suppression unit 230d, the speech plus noise signal s(t) is
transformed (e.g., to the frequency domain) by an FFT 632a, and the mostly noise signal
x(t) is similarly transformed by an FFT 632b. The magnitude component of the
transformed speech plus noise signal S(ω) is provided to a voice activity detector 640 and
a summer 634. The magnitude component of the transformed noise signal X(ω) is
provided to an adaptive filter 650, which filters the noise with a particular transfer
function H(ω). The filtered noise P(ω) is then provided to summer 634 and subtracted
from the magnitude component of the transformed speech plus noise S(ω) to provide the
magnitude component for an intermediate signal D(ω) having predominantly speech and
a large portion of the low frequency noise removed.

[167] Adaptive filter 650 includes a base filter operating in conjunction with an
adaptation algorithm. The base filter is adapted during periods of non-speech activity, as
indicated by a control signal from voice activity detector 640. Again, the adaptation may
be achieved via an LMS algorithm or some other algorithm. The base filter then filters
the transformed noise X(ω) with the transfer function H(ω) to provide an estimate of the
noise received on the signal S(ω).

[168] The transfer function of the base filter may be a linear or non-linear
function. A linear transfer function may be implemented similarly to that described above
for FIG. 5. In an embodiment, a non-linear transfer function may be implemented as
follows:

$$\mathbf{P} = \mathbf{H}\,\mathbf{X} \qquad \text{Eq(6)}$$

where P is a vector of L transformed elements for the estimated noise (i.e., P_n(ω_0)
through P_n(ω_{L-1})), X is a vector of L transformed elements for the mostly noise signal
x(t) (i.e., X_n(ω_0) through X_n(ω_{L-1})), and H is a matrix of the transfer function for the
base filter. Each estimated element, P_n(ω_j), at time n for frequency bin j can be
expressed as:

$$P_n(\omega_j) = H_n(0,j)\,X_n(\omega_0) + H_n(1,j)\,X_n(\omega_1) + \cdots + H_n(L-1,j)\,X_n(\omega_{L-1})$$

where j = 0, 1, ..., L-1. Thus, for this specific transfer function, each estimated element
P_n(ω_j) is a linear combination of the L elements of the noise X_n(ω) weighted by
H_n(ω).
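Eq(6) and its element form can be sketched directly on magnitude vectors, with each column of H adapted by NLMS during non-speech frames. The indexing H[i][j] follows the element form above (input bin i, output bin j). The NLMS update, step size, function name, and example matrix are assumptions for illustration, not taken from the source.

```python
import random


def matrix_filter_step(H, X_mag, S_mag, speech_active, mu=0.5, eps=1e-12):
    """One frame of a FIG. 6 style magnitude-domain canceller (sketch).

    H: L x L list of lists; H[i][j] weights input bin i for output bin j,
       matching P_n(w_j) = sum_i H_n(i, j) X_n(w_i). Updated in place.
    X_mag: magnitudes |X_n(w_i)| of the mostly noise signal, length L
    S_mag: magnitudes |S_n(w_j)| of the speech plus noise signal, length L
    Returns the noise estimate P and the intermediate magnitudes D = |S| - P.
    """
    L = len(X_mag)
    P = [sum(H[i][j] * X_mag[i] for i in range(L)) for j in range(L)]
    D = [s - p for s, p in zip(S_mag, P)]
    if not speech_active:
        # NLMS update of each column H[:, j] to reduce the error D[j].
        norm = eps + sum(x * x for x in X_mag)
        for i in range(L):
            for j in range(L):
                H[i][j] += mu * D[j] * X_mag[i] / norm
    return P, D


rng = random.Random(3)
H_true = [[0.6, 0.1, 0.0], [0.2, 0.5, 0.1], [0.0, 0.3, 0.7]]  # hypothetical
H = [[0.0] * 3 for _ in range(3)]
for _ in range(2000):
    X_mag = [rng.uniform(0.0, 1.0) for _ in range(3)]
    S_mag = [sum(H_true[i][j] * X_mag[i] for i in range(3)) for j in range(3)]
    matrix_filter_step(H, X_mag, S_mag, speech_active=False)
```

Each output bin is estimated from every input bin, so energy that the noise path moves between bins can still be cancelled, which a diagonal (per-bin) H cannot do.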

[169] Other non-linear transfer functions may also be used and are within the
scope of the invention.



[170] In the embodiment shown in FIG. 6, additional signal processing is
performed on the intermediate signal D(ω) to remove higher frequency noise components.
The magnitude component of the intermediate signal D(ω) is provided to a noise
spectrum estimator 642 and a multiplier 636. Noise spectrum estimator 642 also receives
the control signal from voice activity detector 640 indicative of periods of speech and
non-speech activity, and estimates the spectrum or power spectral density (PSD) of each
of the speech and noise components based on the magnitude of the signal D(ω). The PSD
estimates for the speech and noise are provided to a gain calculation unit 644. Again, the
speech and noise PSD estimates can be performed as described above and in the
aforementioned U.S. Patent No. 5,943,429.
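The PSD estimation method itself is deferred to U.S. Patent No. 5,943,429 and the earlier description, so the following is only a generic sketch: a first-order recursive average of |D(ω)|² updated on non-speech frames, with the speech PSD taken as the non-negative excess over that noise estimate. The smoothing factor and function name are assumptions.

```python
def update_psd_estimates(D_mag, noise_psd, speech_active, alpha=0.9):
    """One frame of a VAD-gated speech/noise PSD estimator (sketch).

    D_mag: magnitudes |D(w)| of the intermediate signal for one frame
    noise_psd: current per-bin noise PSD estimate
    Returns (speech_psd, noise_psd) as new lists.
    """
    power = [m * m for m in D_mag]
    if not speech_active:
        # Non-speech frame: smooth the noise PSD toward the observed power.
        noise_psd = [alpha * n + (1.0 - alpha) * p
                     for n, p in zip(noise_psd, power)]
    # Speech PSD: observed power in excess of the noise estimate, never negative.
    speech_psd = [max(p - n, 0.0) for p, n in zip(power, noise_psd)]
    return speech_psd, noise_psd


speech, noise = update_psd_estimates([2.0, 1.0], [0.0, 0.0], False, alpha=0.5)
# noise is [2.0, 0.5]
speech2, noise2 = update_psd_estimates([3.0, 0.5], noise, True, alpha=0.5)
# speech2 is [7.0, 0.0]; noise2 is unchanged during speech
```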

[171] Gain calculation unit 644 generates a scaling factor for each frequency bin
of the intermediate signal D(ω). The scaling factors for all frequency bins can be
generated in the manner described above and in the aforementioned U.S. Patent No.
5,943,429. The scaling factors are then provided to multiplier 636 and used to scale the
magnitude of the intermediate signal D(ω). The scaled magnitude component is
recombined with the phase component and provided to an inverse FFT (IFFT) 638, which
transforms the recombined signal back to the time domain. The resultant output signal
y(t) from IFFT 638 includes predominantly speech and has a larger portion of the noise
removed. Again, most of the deterministic noise component can be removed by noise
suppression unit 230d.
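The scaling and recombination step can be sketched as below. The gain rule here is a Wiener-style ratio of speech PSD to total PSD with a floor, used as a stand-in because the source defers the actual rule to U.S. Patent No. 5,943,429; a naive inverse DFT stands in for IFFT 638. All names are hypothetical.

```python
import cmath
import math


def wiener_gains(speech_psd, noise_psd, floor=0.05):
    """Per-bin scaling factors from a Wiener-style rule (stand-in)."""
    gains = []
    for ps, pn in zip(speech_psd, noise_psd):
        total = ps + pn
        g = ps / total if total > 0.0 else 0.0
        gains.append(max(g, floor))      # floor limits over-suppression
    return gains


def apply_gains(D, gains):
    """Scale each bin's magnitude and recombine it with the original phase."""
    return [cmath.rect(g * abs(d), cmath.phase(d)) for d, g in zip(D, gains)]


def inverse_dft(bins):
    """Naive O(N^2) inverse DFT standing in for IFFT 638.

    Returns the real part, which is the time signal only when the bins are
    conjugate-symmetric (as they are for the spectrum of a real signal).
    """
    n = len(bins)
    return [sum(b * cmath.exp(2j * math.pi * k * t / n)
                for k, b in enumerate(bins)).real / n
            for t in range(n)]


gains = wiener_gains([3.0, 1.0], [1.0, 3.0])     # [0.75, 0.25]
scaled = apply_gains([2j, -4.0], [0.5, 0.25])    # about [1j, -1]
y = inverse_dft([10, -2 + 2j, -2, -2 - 2j])      # about [1.0, 2.0, 3.0, 4.0]
```

Scaling the magnitude by a real, non-negative gain and reattaching the original phase is equivalent to multiplying each complex bin by that gain.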

[172] Other signal processing schemes may be used to process the speech plus
noise signal s(t) and the mostly noise signal x(t) to provide the desired output signal y(t)
having mostly speech and a large portion of the noise removed. These various signal
processing schemes are also within the scope of the invention.

[173] If beam forming units are used as shown in FIG. 2, then various types of
microphones can be supported. The processing to derive the speech plus noise signal s(t)
and the mostly noise signal x(t) may be performed by the main and blocking beam
formers, respectively, as described above in FIG. 2. However, the signals s(t) and x(t)
may also be derived without the use of the beam formers, as described below.

[174] FIG. 7A is a block diagram of a speech processing system 700 suitable for
removing background noise from a speech plus noise signal, which may be used for
both near-field and far-field applications. Within system 700, speech plus noise is
received via a first microphone 710a, and mostly noise is received via a second
microphone 710b. Microphone 710a thus receives the desired speech from a speaking
user and the undesired background noise from the environment. Microphone 710b is
configured to detect mostly the noise component to be suppressed from the signal
received by microphone 710a.

[175] FIG. 7B is a diagram that illustrates a simple configuration of two dipole
microphones used to derive the signals s(t) and x(t). The ability to pick up signal plus
noise or mostly noise may be achieved by proper placement of the microphones and/or
use of certain types of microphones. For example, microphone 710a may be located on
the device such that it is close to the mouth during use (e.g., microphone 110b in FIG.
1B), in which case the speech component is typically larger than the noise component.
Conversely, microphone 710b may be located such that the noise component is larger
than the speech component.

[176] Microphones 710a and 710b may also be implemented with dipole
microphones (or pressure gradient microphones). A dipole microphone has two main
"lobes" and can pick up signals from both the front and back but not from the sides (its
nulls). If the direction of speech is known or fixed, then microphone 710a may be placed on
the device such that its main lobe points toward the direction of the speech so that mostly
speech is picked up by the microphone, as shown in FIG. 7B. Conversely, microphone
710b may be placed such that its null points toward the direction of speech so that little
speech is picked up by the microphone, as also shown in FIG. 7B.
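An ideal dipole (pressure-gradient) microphone has a figure-8 pattern whose response is proportional to cos θ, where θ is the angle between the source direction and the microphone axis. The sketch below assumes ideal patterns and a talker directly in front of the device (both assumptions, with hypothetical names) to show why pointing the main lobe of microphone 710a at the talker, and the null of microphone 710b at the talker, separates speech from noise.

```python
import math


def dipole_gain(mic_axis_deg, source_deg):
    """Magnitude response of an ideal dipole (figure-8) microphone.

    The ideal pressure-gradient pattern is cos(theta), where theta is the
    angle between the source direction and the microphone axis. The front
    and back lobes have opposite polarity, so the magnitude is returned.
    """
    theta = math.radians(source_deg - mic_axis_deg)
    return abs(math.cos(theta))


SPEECH_DIR = 0.0      # hypothetical: talker straight ahead of the device
MIC_710A_AXIS = 0.0   # main lobe toward the talker
MIC_710B_AXIS = 90.0  # null toward the talker

speech_in_a = dipole_gain(MIC_710A_AXIS, SPEECH_DIR)  # 1.0
speech_in_b = dipole_gain(MIC_710B_AXIS, SPEECH_DIR)  # about 0.0
```

Noise arriving from other directions still reaches microphone 710b through its lobes, which is what makes its output usable as the mostly noise signal x(t).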

[177] Referring back to FIG. 7A, microphone 710a provides the signal s(t)
comprised of the signal plus noise, and microphone 710b provides the signal x(t)
comprised of mostly the noise component. For this microphone configuration, the main
and blocking beam forming units are not needed to generate s(t) and x(t), respectively.

[178] The speech plus noise signal s(t) from microphone 710a and the mostly
noise signal x(t) from microphone 710b are provided to a signal processing unit 720,
which processes the signals s(t) and x(t) to provide an output signal y(t) that includes
mostly speech. Signal processing unit 720 may be designed to implement noise
suppression unit 230a, 230b, 230c, or 230d, or some other noise suppressor design. A
memory 730 may be used to provide storage for data and/or program code used by signal
processor 720.

[179] As noted above, any number of microphones greater than one may
be used (in combination with noise suppression) to generate the desired output signal.
The embodiments shown in FIGS. 1A through 1C are illustrative, and a greater or fewer
number of microphones may be used.

[180] Digital signal processing is used herein to process the signals from the
microphones to generate the desired output signal. The use of digital signal processing
allows for (1) easy implementation of various algorithms (e.g., the NLMS algorithm)
used for the signal processing, (2) processing of the signals in the frequency domain,
which may provide improved performance, and (3) other advantages.

[181] The signal processing described herein (especially the embodiment in FIG. 2)
may be used to provide the desired output signal for both near-field and far-field
applications. For far-field applications, adaptive beam forming may be used to obtain the
speech plus noise signal s(t) and the mostly noise signal x(t). Beam forming may also be
used for near-field applications. For certain microphone configurations (such as that
shown in FIG. 7A), the signals from the microphones may be used directly as the speech
plus noise signal s(t) and the mostly noise signal x(t). In either case, the same signal
processing may be used to process the signals s(t) and x(t), however derived, to adaptively
determine the noise component and to suppress this noise component from the speech
plus noise signal to provide the desired output signal. The ability to support both near-field
and far-field applications is especially advantageous for wireless communication
devices.

[182] The noise suppression described herein provides an output signal having
improved characteristics. A large portion of the noise may be removed from the signal,
which improves the quality of the output signal. The techniques described herein allow
a user to talk softly even in a noisy environment, which provides privacy and is highly
desirable.

[183] The noise suppression techniques described herein may be implemented
within a small form factor. The microphones may be placed close to each other (e.g.,
only five centimeters of separation between microphones may be sufficient). Also, the
microphones are not placed in an end-fire type of configuration, i.e., one in which the
microphones are placed in front of one another along an axis that is pointed
approximately toward the sound source. This small form factor allows the noise
suppression to be implemented in various types of devices such as cellular telephones,
personal digital assistants (PDAs), tape recorders, telephones, and so on.

[184] For simplicity, the signal processing systems described above use
microphones as signal detectors. Other types of signal detectors may also be used to
detect the desired and undesired components. For certain applications, sensors may be
used to detect other types of noise such as vibration, road noise, motion, and others.

[185] For clarity, the signal processing systems have been described for the
processing of speech. In general, these systems may be used to process any signal having a
desired component and an undesired component.

[186] The signal processing systems and techniques described herein may be
implemented in various manners. For example, these systems and techniques may be
implemented in hardware, software, or a combination thereof. For a hardware
implementation, the signal processing elements (e.g., the beam forming units, noise
suppression, and so on) may be implemented within one or more application specific
integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices
(PLDs), controllers, microcontrollers, microprocessors, other electronic units designed to
perform the functions described herein, or a combination thereof. For a software
implementation, the signal processing systems and techniques may be implemented with
modules (e.g., procedures, functions, and so on) that perform the functions described
herein. The software code may be stored in a memory unit (e.g., memory 730 in FIG. 7A)
and executed by a processor (e.g., signal processor 720). The memory unit may be
implemented within the processor or external to the processor, in which case it can be
communicatively coupled to the processor via various means as is known in the art.

[187] The foregoing description of the specific embodiments is provided to
enable any person skilled in the art to make or use the present invention. Various
modifications to these embodiments will be readily apparent to those skilled in the art,
and the generic principles defined herein may be applied to other embodiments without
the use of the inventive faculty. Thus, the present invention is not intended to be limited
to the embodiments shown herein but is to be accorded the widest scope consistent with
the principles and novel features disclosed herein, and as defined by the following claims.






